This notebook performs a basic overview of the data and analyses the genetic diversity indicators. It uses as input the “clean kobo output” that was first cleaned by 1.2_cleaning.

Get data and functions

Load required libraries:

Load required functions. These custom functions are available at: https://github.com/AliciaMstt/GeneticIndicators

Other custom functions:

Custom colors:

Get indicators data from clean kobo output

# Get data:
kobo_clean<-read.csv(file="kobo_output_clean.csv", header=TRUE)

# Extract indicator 1 data from kobo output, show most relevant columns
ind1_data<-get_indicator1_data(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(ind1_data[,c(1:3, 12:14)])
# Extract indicator 2 data from kobo output, show most relevant columns
ind2_data<-get_indicator2_data(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(ind2_data[,c(1:3, 9:10,13)])
# Extract indicator 3 data from kobo output, show most relevant columns
ind3_data<-get_indicator3_data(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(ind3_data[,c(1:3, 9:11)])
# extract metadata, show most relevant columns
metadata<-get_metadata(kobo_output=kobo_clean)
## [1] "the data already contained a taxon column, that was used instead of creating a new one"
head(metadata[,c(1:3, 12, 25,26, 64)])
# save processed data
write.csv(ind1_data, "ind1_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
write.csv(ind2_data, "ind2_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
write.csv(ind3_data, "ind3_data.csv", row.names = FALSE, fileEncoding = "UTF-8")
write.csv(metadata, "metadata.csv", row.names = FALSE, fileEncoding = "UTF-8")

General description of the dataset

Methods to define populations

The methods used to define populations come from a check-box question where one or more of the following categories can be selected: genetic_clusters, geographic_boundaries, eco_biogeo_proxies, adaptive_traits, management_units, other. As a consequence, any combination of these categories is possible, which leads to the following counts:

## 
##                                                                            adaptive_traits 
##                                                                                          3 
##                                                           adaptive_traits management_units 
##                                                                                         20 
##                                                                         eco_biogeo_proxies 
##                                                                                         37 
##                                                         eco_biogeo_proxies adaptive_traits 
##                                                                                          2 
##                                                        eco_biogeo_proxies management_units 
##                                                                                          5 
##                                                                   eco_biogeo_proxies other 
##                                                                                          3 
##                                                                           genetic_clusters 
##                                                                                        101 
##                                                           genetic_clusters adaptive_traits 
##                                                                                          4 
##                                                        genetic_clusters eco_biogeo_proxies 
##                                                                                         20 
##                                        genetic_clusters eco_biogeo_proxies adaptive_traits 
##                                                                                          3 
##                       genetic_clusters eco_biogeo_proxies adaptive_traits management_units 
##                                                                                          2 
##                                       genetic_clusters eco_biogeo_proxies management_units 
##                                                                                          1 
##                                                     genetic_clusters geographic_boundaries 
##                                                                                         71 
##                                     genetic_clusters geographic_boundaries adaptive_traits 
##                                                                                          4 
##                                  genetic_clusters geographic_boundaries eco_biogeo_proxies 
##                                                                                          7 
##                  genetic_clusters geographic_boundaries eco_biogeo_proxies adaptive_traits 
##                                                                                          1 
## genetic_clusters geographic_boundaries eco_biogeo_proxies adaptive_traits management_units 
##                                                                                          1 
##                 genetic_clusters geographic_boundaries eco_biogeo_proxies management_units 
##                                                                                          1 
##                                    genetic_clusters geographic_boundaries management_units 
##                                                                                          8 
##                                                          genetic_clusters management_units 
##                                                                                          6 
##                                                                     genetic_clusters other 
##                                                                                          2 
##                                                                      geographic_boundaries 
##                                                                                        254 
##                                                      geographic_boundaries adaptive_traits 
##                                                                                         30 
##                                     geographic_boundaries adaptive_traits management_units 
##                                                                                         12 
##                               geographic_boundaries adaptive_traits management_units other 
##                                                                                          1 
##                                                   geographic_boundaries eco_biogeo_proxies 
##                                                                                         21 
##                                   geographic_boundaries eco_biogeo_proxies adaptive_traits 
##                                                                                          3 
##                                  geographic_boundaries eco_biogeo_proxies management_units 
##                                                                                          3 
##                                             geographic_boundaries eco_biogeo_proxies other 
##                                                                                          2 
##                                                     geographic_boundaries management_units 
##                                                                                         23 
##                                                                geographic_boundaries other 
##                                                                                          9 
##                                                                           management_units 
##                                                                                        112 
##                                                                     management_units other 
##                                                                                          2 
##                                                                                      other 
##                                                                                         20
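A tabulation like the one above can be sketched in base R as follows. This is only an illustration on made-up data: the column name `defined_populations` and the assumption that ticked options are stored as one space-separated string per record are not confirmed by the notebook.

```r
# Toy data (hypothetical): one row per record, ticked options stored
# as a single space-separated string.
toy <- data.frame(
  taxon = c("sp1", "sp2", "sp3", "sp4", "sp5"),
  defined_populations = c(
    "genetic_clusters",
    "genetic_clusters geographic_boundaries",
    "geographic_boundaries",
    "genetic_clusters geographic_boundaries",
    "other"
  )
)

# Number of records per exact combination of methods
combo_counts <- table(toy$defined_populations)
combo_counts

# Number of times each individual method was ticked, in any combination
single_counts <- table(
  unlist(strsplit(toy$defined_populations, " ", fixed = TRUE))
)
single_counts
```

The first table mirrors the per-combination counts shown above; the second is a complementary view that ignores which other options were ticked alongside.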

It is hard to group the above methods, so we will keep the original groups with n >= 19 in the above list and tag the combinations that appear only a few times as “low_freq_combinations”.
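One possible way to do this lumping, shown on made-up data. The vector `methods` is hypothetical; only the threshold of 19 and the label `low_freq_combinations` come from the text.

```r
# Toy vector of method combinations (hypothetical counts)
methods <- c(
  rep("geographic_boundaries", 25),
  rep("genetic_clusters", 19),
  rep("adaptive_traits", 3),
  rep("genetic_clusters other", 2)
)

# Combinations that reach the threshold keep their own label
counts <- table(methods)
keep <- names(counts)[counts >= 19]

# Everything else is pooled into a single category
simplified <- ifelse(methods %in% keep, methods, "low_freq_combinations")
table(simplified)
```

Because the threshold is applied to the counts of the full dataset, any later filtering (for example removing outliers) can push a kept group below 19 records.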

Which groups have n>=19?

Check n for simplified methods:

## 
##         adaptive_traits management_units 
##                                       20 
##                       eco_biogeo_proxies 
##                                       37 
##                         genetic_clusters 
##                                      101 
##      genetic_clusters eco_biogeo_proxies 
##                                       20 
##   genetic_clusters geographic_boundaries 
##                                       71 
##                    geographic_boundaries 
##                                      254 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       21 
##   geographic_boundaries management_units 
##                                       23 
##                    low_freq_combinations 
##                                       85 
##                         management_units 
##                                      112 
##                                    other 
##                                       20

Another option is to highlight whether genetic_clusters or geographic_boundaries, which are the main drivers, were used at all. This will look like:
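A sketch of how such flags might be derived with `grepl()`. The input vector and the output labels (`genetic_and_geographic`, `genetic_clusters_any`, and so on) are invented for illustration and are not the notebook's actual category names.

```r
# Toy method strings (hypothetical)
defined_populations <- c(
  "genetic_clusters",
  "genetic_clusters eco_biogeo_proxies",
  "geographic_boundaries management_units",
  "genetic_clusters geographic_boundaries",
  "management_units"
)

# Was each of the two main drivers ticked at all?
uses_genetic    <- grepl("genetic_clusters", defined_populations, fixed = TRUE)
uses_geographic <- grepl("geographic_boundaries", defined_populations, fixed = TRUE)

# Collapse the two flags into one label per record
focus <- ifelse(uses_genetic & uses_geographic, "genetic_and_geographic",
         ifelse(uses_genetic, "genetic_clusters_any",
         ifelse(uses_geographic, "geographic_boundaries_any", "neither")))

# Table of equivalences between original and collapsed categories
unique(data.frame(defined_populations, focus))
```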

Table of equivalences:

Total number of taxa and taxa assessed more than once.

Records by country, including taxa assessed more than once (see below for details on this)

Did countries use kobo or tabular?

Records by taxonomic groups

Some taxa were assessed twice, for example to account for uncertainty on how to divide populations. This information is stored in variable multiassessment of the metadata (created by get_metadata()). An example of taxa with multiple assessments:

In total, this is the number of records (assessments) in each category:

## 
##   multiassessment single_assessment 
##                73               721

The above numbers refer to the number of records. If what we want to know is how many taxa were analysed in each category, then:

Number of taxa with multiple submissions:

## [1] 35

Number of taxa with single submissions:

## [1] 721

To explore what kind of taxa countries assessed, regardless of whether they assessed them once or more, let's create a dataset keeping all singly assessed taxa plus only the first assessment for taxa assessed multiple times.
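A minimal sketch of that subsetting, assuming records are already in assessment order and that a `taxon` column identifies the species. The toy data are invented; only the `multiassessment` variable and its two values come from the notebook.

```r
# Toy metadata (hypothetical): sp_b and sp_c were assessed more than once
meta <- data.frame(
  taxon = c("sp_a", "sp_b", "sp_b", "sp_c", "sp_c", "sp_c"),
  multiassessment = c("single_assessment", rep("multiassessment", 5)),
  n_extant_populations = c(4, 10, 12, 3, 5, 8)
)

# Number of distinct taxa (not records) in each category
taxa_per_category <- tapply(
  meta$taxon, meta$multiassessment,
  function(x) length(unique(x))
)
taxa_per_category

# Keep the first record of every taxon: single assessments are kept as-is,
# multi-assessed taxa contribute only their first assessment
first_only <- meta[!duplicated(meta$taxon), ]
nrow(first_only)
```

With the real data, the same logic would be expected to give 721 + 35 = 756 rows, matching the counts reported in this section.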

How many records?

## [1] 756

Which countries and taxonomic groups do the taxa assessed more than once belong to?

Now check taxa assessed excluding duplicates, i.e. the real number of taxa assessed. This will be used in downstream analyses

Sankey and alluvial fun

Note: The following plots in this section consider only one record of the taxa that were assessed more than once. That is a total of 756 taxa.

Which taxonomic groups are countries assessing?

Note on alluvial vs Sankey, taken from ggalluvial: An important feature of alluvial plots is the meaningfulness of the vertical axis: No gaps are inserted between the strata, so the total height of the plot reflects the cumulative quantity of the observations. The plots produced by {ggalluvial} conform to the “grammar of graphics” principles of {ggplot2}, and this prevents users from producing “free-floating” visualizations like the Sankey diagrams

Using alluvial:
## [1] "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D"

Using ggsankey

Ne / Nc data across countries and taxa?

Using alluvial:
## [1] "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D"

Using ggsankey option 1

Using ggsankey option 2

Taxonomic groups and IUCN status

Using alluvial:

## [1] "cr"           "dd"           "en"           "lc"           "not_assessed"
## [6] "nt"           "unknown"      "vu"
## [1] "brown2"     "brown2"     "darkorange" "darkorange" "darkorange"
## [6] "darkgreen"

Using ggsankey

Method to define populations

The following plots consider the whole dataset, i.e. including taxa that were assessed more than once (because they could have been analysed using different methods to define populations).

## [1] "#668cd1" "#668cd1" "#668cd1" "#668cd1" "#668cd1" "#45c097"

Same only country and method:

Sankey, just because why not:

Indicator 1 (populations Ne > 500)

Remember population size data could be obtained by different means

Population size data may come from different methods for each population within a single taxon. For example, some populations can have Ne estimates, other Nc and others a range. Examples:

Also, for some taxa there may be population size data for some populations, but not all. Therefore indicator 1 would be computed with fewer populations than the total number of populations. Example (see pop3, 4, 13, 15):

We need to keep this in mind when interpreting and discussing how the indicator can change in future assessments if data become available for populations currently missing.

How many of the 2084 populations have Ne, Nc or range data?

Ne?

## [1] 128

Nc point?

## [1] 616

Nc range?

## [1] 1116
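Counts like these can be obtained by counting non-missing values per column. The sketch below uses made-up data; the column names `Ne`, `NcPoint` and `NcRange` and the range labels are assumptions, not necessarily the names used in `ind1_data`.

```r
# Toy population-level table (hypothetical columns and values)
pops <- data.frame(
  population = paste0("pop", 1:6),
  Ne      = c(120, NA, NA, 800, NA, NA),
  NcPoint = c(NA, 5000, NA, NA, 300, NA),
  NcRange = c(NA, NA, "range_a", NA, NA, NA)
)

size_cols <- c("Ne", "NcPoint", "NcRange")

# Populations with a value for each type of size data
n_with_data <- colSums(!is.na(pops[, size_cols]))
n_with_data

# Populations with no size data of any kind
n_without_any <- sum(rowSums(!is.na(pops[, size_cols])) == 0)
n_without_any
```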

Has Ne values?

Ne values

How is Ne data distributed?

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
##       1.9      37.9     150.0   28394.3     537.0 3000000.0      1956
Boxplot of Ne values: Check outliers (Ne very high):

Boxplot filtering outliers (Ne)

Indicator 2 (proportion of populations within species which are maintained)

Indicator 2 is the proportion of populations within species which are maintained. This can be estimated based on n_extant_populations and n_extint_populations, as follows:

## [1] 1.0000000 0.5000000 0.2941176 1.0000000 0.3333333 1.0000000
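Read as a formula, the proportion of maintained populations is extant / (extant + extinct). A minimal sketch on invented vectors (the notebook's own helper function may differ in details):

```r
# Toy counts per taxon (hypothetical); NA means the number is unknown
n_extant  <- c(5, 2, 5, NA, 3)
n_extinct <- c(0, 2, 12, 1, NA)

# Proportion of populations maintained; NA in either input propagates
indicator2 <- n_extant / (n_extant + n_extinct)
indicator2
```

Any taxon with a missing extant or extinct count ends up with a missing indicator value, which is why the distribution of NA is examined later in this section.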

Number of extant populations.

See the distribution of the number of extant populations:

Which taxa have more than 100 populations?

Exclude outliers (>200 populations)

How does the number of populations vary by country? (excluding outliers: >200 pops)

And by method to define populations? (excluding outliers: >200 pops)

Simplified method categories for easier visualization:

Number of populations by taxonomic group:

Taxonomic group and method:

Country and method:

Country and method, but with the US and Sweden in different scale

Number of populations by taxonomic group and range type:

Number of populations by taxonomic group and global IUCN:

ANOVAS on the number of extant populations

One-way ANOVA for the effect of the method to define populations on the number of extant pops, removing the extreme outlier (>1,000 pops)

# subset data without massive outlier
ind2_data_anova<- ind2_data %>% 
                         filter(n_extant_populations<1000)

# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       37 
##                         genetic_clusters 
##                                       98 
##      genetic_clusters eco_biogeo_proxies 
##                                       19 
##   genetic_clusters geographic_boundaries 
##                                       71 
##                    geographic_boundaries 
##                                      251 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       21 
##   geographic_boundaries management_units 
##                                       23 
##                    low_freq_combinations 
##                                       84 
##                         management_units 
##                                      108 
##                                    other 
##                                       14
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = n_extant_populations ~ defined_populations_simplified, 
##     data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified Residuals
## Sum of Squares                         42563.5 1507793.9
## Deg. of Freedom                             11       763
## 
## Residual standard error: 44.45378
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                 Df  Sum Sq Mean Sq F value Pr(>F)  
## defined_populations_simplified  11   42564    3869   1.958 0.0298 *
## Residuals                      763 1507794    1976                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Same One-way ANOVA for the effect of the method to define populations on the number of extant pops, but removing outliers >200 pops

# subset data without outliers
ind2_data_anova<- ind2_data %>% 
                         filter(n_extant_populations<200)

# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       36 
##                         genetic_clusters 
##                                       98 
##      genetic_clusters eco_biogeo_proxies 
##                                       19 
##   genetic_clusters geographic_boundaries 
##                                       69 
##                    geographic_boundaries 
##                                      248 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       20 
##   geographic_boundaries management_units 
##                                       23 
##                    low_freq_combinations 
##                                       84 
##                         management_units 
##                                      108 
##                                    other 
##                                       14
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = n_extant_populations ~ defined_populations_simplified, 
##     data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified Residuals
## Sum of Squares                         18929.5  358242.7
## Deg. of Freedom                             11       756
## 
## Residual standard error: 21.76846
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                 Df Sum Sq Mean Sq F value   Pr(>F)    
## defined_populations_simplified  11  18929  1720.9   3.632 4.98e-05 ***
## Residuals                      756 358243   473.9                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):
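As an illustration of this step, `TukeyHSD()` from base R's stats package can be applied directly to an `aov` fit. The data below are simulated and the group names are only borrowed from the method categories; this is a sketch, not the notebook's actual result.

```r
set.seed(1)

# Simulated counts of extant populations for three methods (toy data)
toy <- data.frame(
  method = factor(rep(
    c("genetic_clusters", "geographic_boundaries", "management_units"),
    each = 20
  )),
  n_extant_populations = c(rpois(20, 5), rpois(20, 15), rpois(20, 6))
)

fit <- aov(n_extant_populations ~ method, data = toy)

# Pairwise comparisons with family-wise adjusted p-values
tukey <- TukeyHSD(fit)

# One row per pair of groups: difference, confidence bounds, adjusted p
tukey$method
```

With 12 method categories there are 66 pairwise comparisons, so filtering the resulting matrix on the `p adj` column is usually easier than reading it in full.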

One-way ANOVA for the effect of the country on the number of extant pops, removing outliers >200 pops

# subset data without outliers
ind2_data_anova<- ind2_data %>% 
                         filter(n_extant_populations<200)

# summary of n per variable
table(ind2_data_anova$country_assessment)
## 
##     australia       belgium        france         japan        mexico 
##            81            81            55            50            83 
##  south_africa        sweden united_states 
##           120           114           184
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ country_assessment, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = n_extant_populations ~ country_assessment, data = ind2_data_anova)
## 
## Terms:
##                 country_assessment Residuals
## Sum of Squares             33031.9  344140.2
## Deg. of Freedom                  7       760
## 
## Residual standard error: 21.27947
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## country_assessment   7  33032    4719   10.42 1.56e-12 ***
## Residuals          760 344140     453                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):

One-way ANOVA for the effect of the taxonomic group on the number of extant pops, removing outliers >200 pops and taxonomic groups with too few data

# summary of n per variable
table(ind2_data$taxonomic_group)
## 
##     amphibian    angiosperm          bird     bryophyte          fish 
##            49           222            87             4            57 
##        fungus    gymnosperm  invertebrate        mammal         other 
##             1            17           133           134            18 
## pteridophytes       reptile 
##            12            60
# subset data 
ind2_data_anova<- ind2_data %>% 
                         filter(n_extant_populations<200) %>% 
                         filter(taxonomic_group %!in% c("fungus", "bryophyte", "other", "pteridophytes"))

# summary of n per variable
table(ind2_data_anova$taxonomic_group)
## 
##    amphibian   angiosperm         bird         fish   gymnosperm invertebrate 
##           48          217           86           56           16          128 
##       mammal      reptile 
##          129           53
# One way ANOVA method
res.anova.extant<-aov(n_extant_populations ~ taxonomic_group, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = n_extant_populations ~ taxonomic_group, data = ind2_data_anova)
## 
## Terms:
##                 taxonomic_group Residuals
## Sum of Squares          17549.4  343600.3
## Deg. of Freedom               7       725
## 
## Residual standard error: 21.76997
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## taxonomic_group   7  17549  2507.1    5.29 6.53e-06 ***
## Residuals       725 343600   473.9                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):

Two-way ANOVA with interaction effect of the method to define populations and the country, removing outliers >200 pops. Question: is interaction the correct model, or should it be additive?

## Call:
##    aov(formula = n_extant_populations ~ defined_populations_simplified * 
##     country_assessment, data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified country_assessment
## Sum of Squares                        18929.47           22866.55
## Deg. of Freedom                             11                  7
##                 defined_populations_simplified:country_assessment Residuals
## Sum of Squares                                           25164.77 310211.34
## Deg. of Freedom                                                43       706
## 
## Residual standard error: 20.9617
## 34 out of 96 effects not estimable
## Estimated effects may be unbalanced
##                                                    Df Sum Sq Mean Sq F value
## defined_populations_simplified                     11  18929    1721   3.916
## country_assessment                                  7  22867    3267   7.434
## defined_populations_simplified:country_assessment  43  25165     585   1.332
## Residuals                                         706 310211     439        
##                                                     Pr(>F)    
## defined_populations_simplified                    1.57e-05 ***
## country_assessment                                1.21e-08 ***
## defined_populations_simplified:country_assessment   0.0793 .  
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
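One way to explore the question of interaction versus additive model is to fit both and compare them as nested models. The sketch below uses simulated, balanced data with invented factor levels, so it only demonstrates the mechanics.

```r
set.seed(42)

# Balanced toy design: 2 methods x 3 countries x 10 replicates
toy <- expand.grid(
  method  = c("genetic_clusters", "geographic_boundaries"),
  country = c("mexico", "sweden", "japan"),
  rep     = 1:10
)
toy$n_extant_populations <- rpois(nrow(toy), lambda = 8)

# Additive model: main effects only
fit_add <- aov(n_extant_populations ~ method + country, data = toy)

# Interaction model: main effects plus method:country
fit_int <- aov(n_extant_populations ~ method * country, data = toy)

# F test for whether the interaction term improves the fit
cmp <- anova(fit_add, fit_int)
cmp

# Information criteria as a complementary comparison
AIC(fit_add, fit_int)
```

Note that `aov()` uses sequential sums of squares, so with unbalanced data such as this dataset the main-effect tests depend on the order of the terms in the formula, and many method by country cells are empty.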
Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):

Two-way ANOVA with interaction effect of the method to define populations and the country, but keeping only groups with enough n

# variables with enough n
enough_n<-ind2_data %>%
              group_by(country_assessment, defined_populations_simplified) %>% 
              summarise(n=n()) %>% 
              filter(n>=15)
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# subset data without outliers and with enough n
ind2_data_anova<- ind2_data %>% 
                  filter(n_extant_populations<200) %>% 
                                # this gives the country
                  filter(country_assessment==unique(enough_n$country_assessment)[1] & 
                                 #this gives the methods for that country (the last [[1]] is to get the results out of a list)
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[1], 2][[1]]  | 
                                
                  # the same for rest of countries. Notice the use of & for methods within country and | to change to other country
                  country_assessment==unique(enough_n$country_assessment)[2] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[2], 2][[1]] | 
                    
                  country_assessment==unique(enough_n$country_assessment)[3] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[3], 2][[1]] |
                    
                  country_assessment==unique(enough_n$country_assessment)[4] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[4], 2][[1]] | 
                    
                  country_assessment==unique(enough_n$country_assessment)[5] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[5], 2][[1]] | 
                    
                  country_assessment==unique(enough_n$country_assessment)[6] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[6], 2][[1]] | 
                    
                  country_assessment==unique(enough_n$country_assessment)[7] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[7], 2][[1]] | 
                    
                  country_assessment==unique(enough_n$country_assessment)[8] & 
                  defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[8], 2][[1]])

# summary of n per variable
ind2_data_anova %>%
  group_by(country_assessment, defined_populations_simplified) %>% summarise(n=n())
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# Two-way ANOVA with interaction effect of the method to define populations and the country
res.anova.extant<-aov(n_extant_populations ~ defined_populations_simplified * country_assessment , data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = n_extant_populations ~ defined_populations_simplified * 
##     country_assessment, data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified country_assessment
## Sum of Squares                        21480.80           19226.60
## Deg. of Freedom                              8                  5
##                 defined_populations_simplified:country_assessment Residuals
## Sum of Squares                                             834.45 234646.92
## Deg. of Freedom                                                 3       531
## 
## Residual standard error: 21.02133
## 55 out of 72 effects not estimable
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                                    Df Sum Sq Mean Sq F value
## defined_populations_simplified                      8  21481    2685   6.076
## country_assessment                                  5  19227    3845   8.702
## defined_populations_simplified:country_assessment   3    834     278   0.629
## Residuals                                         531 234647     442        
##                                                     Pr(>F)    
## defined_populations_simplified                    1.69e-07 ***
## country_assessment                                6.07e-08 ***
## defined_populations_simplified:country_assessment    0.596    
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
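The long chain of per-country conditions used above to build `ind2_data_anova` could be expressed more compactly by matching on country and method together. The following base R sketch uses made-up data and a lower threshold (`min_n <- 3` instead of 15) so that the toy example stays small.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  country_assessment = c(rep("mexico", 5), rep("sweden", 4)),
  defined_populations_simplified = c(
    rep("genetic_clusters", 3), rep("other", 2),
    rep("geographic_boundaries", 3), "other"
  ),
  n_extant_populations = c(4, 6, 250, 2, 3, 10, 12, 9, 1)
)

min_n <- 3  # the notebook uses 15

# One key per country x method cell
key <- interaction(
  toy$country_assessment,
  toy$defined_populations_simplified,
  drop = TRUE
)

# Cells with enough records (counted before removing outliers, as above)
n_per_cell <- table(key)
enough <- names(n_per_cell)[n_per_cell >= min_n]

# Keep records in well-sampled cells and drop outliers
subset_anova <- toy[key %in% enough & toy$n_extant_populations < 200, ]
subset_anova
```

With dplyr already loaded, `semi_join(ind2_data, enough_n, by = c("country_assessment", "defined_populations_simplified"))` followed by the outlier filter should give the same subset without hard-coding the number of countries.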
Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):

Number of extinct populations.

See the distribution of the number of extinct populations:

Exclude outliers (>200 populations)

How does the number of populations vary by country? (excluding outliers: >200 pops)

By method to define populations (excluding outliers: >200 pops), with simplified method categories for easier visualization:

Number of populations by taxonomic group:

Taxonomic group and method:

Country and method:

Number of populations by taxonomic group and range type:

Number of populations by taxonomic group and global IUCN:

Number of extant vs extinct populations

By method

By method. Sweden and US separately because they have too many pops.

By risk status, zooming in to fewer n of pops. Sweden and US separately because they have too many pops.

Distribution of NA in indicator 2

We have NA because in some cases the number of extinct populations is unknown, therefore the above operation cannot be computed.

Total records with NA in extant populations:

## [1] 18
Which are?

Total records with NA in extinct populations:

## [1] 347

Do taxa with NA for extant also have NA for extinct?

##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE

So out of the 794 records, we have 347 with NA in n_extinct and 18 with NA in n_extant. All 18 records with NA in n_extant also have NA in n_extinct.

QUESTION: should we manually check that NA are correct in both extinct and extant pops? (the cleaning script only checks for 0, not NAs)

So in total there are 347 records with NA in either n_extant or n_extinct, which is 43.7% of the total number of records. Therefore, when estimating indicator 2… QUESTION: What should we do: A) accept that we can’t estimate indicator 2 for those species, or B) assume n_extinct = NA = 0, and therefore indicator 2 = 1?
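To make the two options concrete, here is a small sketch on invented counts. It does not decide between A and B; it only shows what each choice does to the resulting values.

```r
# Toy counts (hypothetical); NA means unknown
n_extant  <- c(5, 3, NA, 10)
n_extinct <- c(0, NA, NA, 10)

# Option A: leave NA as missing, so the indicator is missing too
ind2_a <- n_extant / (n_extant + n_extinct)
ind2_a

# Option B: treat unknown extinct counts as zero extinctions
n_extinct_b <- ifelse(is.na(n_extinct), 0, n_extinct)
ind2_b <- n_extant / (n_extant + n_extinct_b)
ind2_b

# Share of records without an indicator value under each option
mean(is.na(ind2_a))
mean(is.na(ind2_b))

# The percentage quoted above: 347 of 794 records
round(100 * 347 / 794, 1)
```

Under option B every record with known extant populations and unknown extinctions gets an indicator of exactly 1, which would shift the distribution upwards; records with NA in n_extant remain missing either way.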

Plots!

By country

By taxonomic group

By method to define pops

By method to define pops and country

By taxonomic group and country

Distribution of multiassessments and single assessments

Some taxa were assessed more than once to account for, for example, different ways of delimiting populations. Create a subset of them, excluding those records with missing data in indicator2 (due to missing data in n_pops).

In total there are 73 multiassessed records, of 35 taxa. Note that these can include records with missing data in the number of populations, for which indicator 2 cannot be estimated.
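A minimal base R sketch of how such a subset and its two counts could be obtained. The data frame `toy` is made up, and the notebook's own code may well use dplyr instead.

```r
toy <- data.frame(
  taxon    = c("sp1", "sp2", "sp2", "sp3", "sp3", "sp3", "sp4"),
  n_extant = c(4, 10, 12, 3, NA, 5, 7)
)

# Count records per taxon and keep the taxa that appear more than once
records_per_taxon <- table(toy$taxon)
multi_taxa <- names(records_per_taxon[records_per_taxon > 1])
toy_multi <- toy[toy$taxon %in% multi_taxa, ]

n_multi_records <- nrow(toy_multi)    # number of multiassessed records
n_multi_taxa    <- length(multi_taxa) # number of multiassessed taxa
```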

To be able to visualize the missing data, the following plot changes NA to -1.

Variation in the number of extant populations by assessment:

Same plot, but excluding Bombus terricola’s massive variation:

Now for extinct populations (NA transformed as -1 for visualization purposes):

See you later, Bombus terricola

This is how much the values of indicator2 vary within multiassessed taxa (taxa names with no values shown have missing data in the number of populations, so indicator 2 can’t be estimated):

For exploratory purposes, unless otherwise stated, the analyses below will use a subset of the data including only taxa assessed a single time, plus the first record of those assessed multiple times.
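One way to build this kind of subset is with `duplicated()`, sketched below on made-up data. It assumes the rows are already ordered so that the first record of each taxon is the one to keep.

```r
toy <- data.frame(
  taxon      = c("sp1", "sp2", "sp2", "sp3", "sp3", "sp3", "sp4"),
  assessment = c(1, 1, 2, 1, 2, 3, 1)
)

# duplicated() flags the second and later records of each taxon, so negating it
# keeps single assessments plus the first record of each multiassessed taxon
toy_first <- toy[!duplicated(toy$taxon), ]

toy_first
```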

Indicator 2 values distribution

Remember, for exploratory purposes, unless otherwise stated, the analyses below will use a subset of the data including only taxa assessed a single time, plus the first record of those assessed multiple times.

For the taxa that do have data, this is how the values of indicator2 are distributed:

Visualizing by country

Visualizing by taxonomic group:

Same, boxplot version:

Zoom in to invertebrates by country:

Visualizing by IUCN:

Visualizing by range type:

Visualizing by rarity:

By population method

Same, boxplot version:

Facet by country

Facet by country and IUCN

Indicator 2 within single countries

Value of indicator 2 within a single country, by method to define populations, taxonomic group and IUCN. Including only 1 assessment per multiassessed taxon.

Does the number of populations affect the value of indicator 2? (excluding outliers: >200 pops, and including all records of multiassessed taxa):

Same, colouring by country

By global IUCN risk

By population definition method:

Run model (first removing missing data)

## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       25 
##                         genetic_clusters 
##                                       44 
##      genetic_clusters eco_biogeo_proxies 
##                                       10 
##   genetic_clusters geographic_boundaries 
##                                       39 
##                    geographic_boundaries 
##                                      139 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       15 
##   geographic_boundaries management_units 
##                                       16 
##                    low_freq_combinations 
##                                       63 
##                         management_units 
##                                       38 
##                                    other 
##                                        7
## 
## Call:  glm(formula = ind2_data_wo_missing$indicator2 ~ ind2_data_wo_missing$n_extant_populations + 
##     ind2_data_wo_missing$defined_populations_simplified + ind2_data_wo_missing$n_extant_populations * 
##     ind2_data_wo_missing$defined_populations_simplified, family = "quasibinomial")
## 
## Coefficients:
##                                                                                                                           (Intercept)  
##                                                                                                                              2.438899  
##                                                                                             ind2_data_wo_missing$n_extant_populations  
##                                                                                                                              0.039827  
##                                                                 ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies  
##                                                                                                                             -1.150927  
##                                                                   ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters  
##                                                                                                                              0.348388  
##                                                ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies  
##                                                                                                                              1.404638  
##                                             ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries  
##                                                                                                                             -0.682750  
##                                                              ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries  
##                                                                                                                             -1.139870  
##                                              ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits  
##                                                                                                                              0.576490  
##                                           ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies  
##                                                                                                                             -0.996594  
##                                             ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units  
##                                                                                                                             -0.731998  
##                                                              ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations  
##                                                                                                                             -0.620621  
##                                                                   ind2_data_wo_missing$defined_populations_simplifiedmanagement_units  
##                                                                                                                             -1.992630  
##                                                                              ind2_data_wo_missing$defined_populations_simplifiedother  
##                                                                                                                             -2.509699  
##                       ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies  
##                                                                                                                             -0.035179  
##                         ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters  
##                                                                                                                             -0.097087  
##      ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies  
##                                                                                                                             -0.172294  
##   ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries  
##                                                                                                                             -0.045113  
##                    ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries  
##                                                                                                                             -0.042280  
##    ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits  
##                                                                                                                             -0.031073  
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies  
##                                                                                                                             -0.042619  
##   ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units  
##                                                                                                                              0.006636  
##                    ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations  
##                                                                                                                             -0.048541  
##                         ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units  
##                                                                                                                             -0.033672  
##                                    ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother  
##                                                                                                                              0.655693  
## 
## Degrees of Freedom: 444 Total (i.e. Null);  421 Residual
## Null Deviance:       195.2 
## Residual Deviance: 165.5     AIC: NA
## 
## Call:
## glm(formula = ind2_data_wo_missing$indicator2 ~ ind2_data_wo_missing$n_extant_populations + 
##     ind2_data_wo_missing$defined_populations_simplified + ind2_data_wo_missing$n_extant_populations * 
##     ind2_data_wo_missing$defined_populations_simplified, family = "quasibinomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7551  -0.3135   0.3176   0.5652   0.9923  
## 
## Coefficients:
##                                                                                                                                        Estimate
## (Intercept)                                                                                                                            2.438899
## ind2_data_wo_missing$n_extant_populations                                                                                              0.039827
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                                                                 -1.150927
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                                                                    0.348388
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies                                                 1.404638
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries                                             -0.682750
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                                                              -1.139870
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits                                               0.576490
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies                                           -0.996594
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units                                             -0.731998
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                                                              -0.620621
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                                                                   -1.992630
## ind2_data_wo_missing$defined_populations_simplifiedother                                                                              -2.509699
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                       -0.035179
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                         -0.097087
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies      -0.172294
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries   -0.045113
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                    -0.042280
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits    -0.031073
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies -0.042619
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units    0.006636
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                    -0.048541
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                         -0.033672
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother                                     0.655693
##                                                                                                                                       Std. Error
## (Intercept)                                                                                                                             0.768536
## ind2_data_wo_missing$n_extant_populations                                                                                               0.120075
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                                                                   0.848557
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                                                                     0.925289
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies                                                  1.497357
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries                                               0.823093
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                                                                0.782282
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits                                                0.979936
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies                                             0.884590
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units                                               0.989659
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                                                                0.806642
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                                                                     0.815451
## ind2_data_wo_missing$defined_populations_simplifiedother                                                                                1.222196
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                         0.120242
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                           0.169784
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies        0.132777
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries     0.120112
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                      0.120168
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits      0.125158
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies   0.120143
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units     0.169104
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                      0.120281
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                           0.123390
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother                                      0.604276
##                                                                                                                                       t value
## (Intercept)                                                                                                                             3.173
## ind2_data_wo_missing$n_extant_populations                                                                                               0.332
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                                                                  -1.356
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                                                                     0.377
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies                                                  0.938
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries                                              -0.829
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                                                               -1.457
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits                                                0.588
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies                                            -1.127
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units                                              -0.740
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                                                               -0.769
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                                                                    -2.444
## ind2_data_wo_missing$defined_populations_simplifiedother                                                                               -2.053
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                        -0.293
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                          -0.572
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies       -1.298
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries    -0.376
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                     -0.352
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits     -0.248
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies  -0.355
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units     0.039
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                     -0.404
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                          -0.273
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother                                      1.085
##                                                                                                                                       Pr(>|t|)
## (Intercept)                                                                                                                            0.00162
## ind2_data_wo_missing$n_extant_populations                                                                                              0.74029
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                                                                  0.17572
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                                                                    0.70672
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies                                                 0.34874
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries                                              0.40730
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                                                               0.14583
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits                                               0.55665
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies                                            0.26055
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units                                              0.45993
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                                                               0.44209
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                                                                    0.01495
## ind2_data_wo_missing$defined_populations_simplifiedother                                                                               0.04065
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                        0.77000
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                          0.56775
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies       0.19513
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries    0.70741
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                     0.72514
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits     0.80405
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies  0.72297
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units    0.96872
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                     0.68674
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                          0.78507
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother                                     0.27850
##                                                                                                                                         
## (Intercept)                                                                                                                           **
## ind2_data_wo_missing$n_extant_populations                                                                                               
## ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                                                                   
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                                                                     
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies                                                  
## ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries                                               
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                                                                
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits                                                
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies                                             
## ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units                                               
## ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                                                                
## ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                                                                   * 
## ind2_data_wo_missing$defined_populations_simplifiedother                                                                              * 
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedeco_biogeo_proxies                         
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters                           
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters eco_biogeo_proxies        
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgenetic_clusters geographic_boundaries     
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries                      
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries adaptive_traits      
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries eco_biogeo_proxies   
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedgeographic_boundaries management_units     
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedlow_freq_combinations                      
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedmanagement_units                           
## ind2_data_wo_missing$n_extant_populations:ind2_data_wo_missing$defined_populations_simplifiedother                                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasibinomial family taken to be 0.3948779)
## 
##     Null deviance: 195.17  on 444  degrees of freedom
## Residual deviance: 165.50  on 421  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6
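The formula in the call above writes both main effects and the product term. In R's formula syntax `a * b` already expands to `a + b + a:b`, so the shorter form fits the same model. The sketch below checks this on simulated data; the data frame `toy` and its columns are invented for illustration, and indicator 2 is assumed to be a proportion between 0 and 1.

```r
# Simulated data: counts of extant and extinct populations per record
set.seed(42)
n <- 120
toy <- data.frame(
  n_extant  = rpois(n, lambda = 8) + 1,
  n_extinct = rpois(n, lambda = 2),
  method    = factor(rep(c("genetic_clusters", "geographic_boundaries",
                           "management_units"), length.out = n))
)
toy$indicator2 <- toy$n_extant / (toy$n_extant + toy$n_extinct)

# Spelled-out formula, mirroring the structure of the call above
m_long <- glm(indicator2 ~ n_extant + method + n_extant * method,
              family = "quasibinomial", data = toy)

# Shorthand: the product term already includes both main effects
m_short <- glm(indicator2 ~ n_extant * method,
               family = "quasibinomial", data = toy)

# Both fits return the same coefficients
all.equal(coef(m_long), coef(m_short))
names(coef(m_short))
```

Passing the data frame through `data =` also keeps the coefficient names short (for example `n_extant:methodmanagement_units`), which makes the printed summary much easier to read than names prefixed with `ind2_data_wo_missing$`.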

Put side by side the plots of number of populations and indicator 2 range (excluding >200 pops outliers). Add the scatter plot of indicator2 and extant pops as a third panel.

ANOVAS on the number of extant populations

One-way ANOVA for the effect of the method to define populations on indicator 2, removing the extreme outlier (>1,000 pops)

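# subset data without massive outlier
# NOTE: indicator2 is a proportion (0 to 1), so this condition only drops rows with NA in indicator2;
# excluding the outlier would presumably require filtering on n_extant_populations instead
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<1000)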

# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       25 
##                         genetic_clusters 
##                                       44 
##      genetic_clusters eco_biogeo_proxies 
##                                       10 
##   genetic_clusters geographic_boundaries 
##                                       39 
##                    geographic_boundaries 
##                                      141 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       15 
##   geographic_boundaries management_units 
##                                       16 
##                    low_freq_combinations 
##                                       63 
##                         management_units 
##                                       38 
##                                    other 
##                                        7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = indicator2 ~ defined_populations_simplified, data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified Residuals
## Sum of Squares                        3.296467 24.662583
## Deg. of Freedom                             11       435
## 
## Residual standard error: 0.2381084
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                 Df Sum Sq Mean Sq F value   Pr(>F)    
## defined_populations_simplified  11  3.296  0.2997   5.286 7.34e-08 ***
## Residuals                      435 24.663  0.0567                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Same one-way ANOVA for the effect of the method to define populations on indicator 2, but removing outliers >200 pops

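# subset data without outliers
# NOTE: indicator2 is a proportion (0 to 1), so this condition only drops rows with NA in indicator2;
# excluding the outliers would presumably require filtering on n_extant_populations instead
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<200)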

# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       25 
##                         genetic_clusters 
##                                       44 
##      genetic_clusters eco_biogeo_proxies 
##                                       10 
##   genetic_clusters geographic_boundaries 
##                                       39 
##                    geographic_boundaries 
##                                      141 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       15 
##   geographic_boundaries management_units 
##                                       16 
##                    low_freq_combinations 
##                                       63 
##                         management_units 
##                                       38 
##                                    other 
##                                        7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = indicator2 ~ defined_populations_simplified, data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified Residuals
## Sum of Squares                        3.296467 24.662583
## Deg. of Freedom                             11       435
## 
## Residual standard error: 0.2381084
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                 Df Sum Sq Mean Sq F value   Pr(>F)    
## defined_populations_simplified  11  3.296  0.2997   5.286 7.34e-08 ***
## Residuals                      435 24.663  0.0567                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                                                   diff         lwr         upr        p adj
## management_units-adaptive_traits management_units           -0.3137825 -0.53362470 -0.09394021 2.282910e-04
## geographic_boundaries-genetic_clusters                      -0.1528869 -0.28799841 -0.01777543 1.204372e-02
## management_units-genetic_clusters                           -0.3142083 -0.48748139 -0.14093522 3.442524e-07
## management_units-genetic_clusters geographic_boundaries     -0.2125950 -0.39094113 -0.03424896 5.807306e-03
## geographic_boundaries adaptive_traits-geographic_boundaries  0.1762672  0.01895216  0.33358220 1.369938e-02
## management_units-geographic_boundaries                      -0.1613214 -0.30433174 -0.01831103 1.254869e-02
## management_units-geographic_boundaries adaptive_traits      -0.3375886 -0.52868138 -0.14649575 8.116474e-07
## management_units-geographic_boundaries management_units     -0.2492383 -0.48241624 -0.01606041 2.442980e-02
## management_units-low_freq_combinations                      -0.2270969 -0.38780613 -0.06638758 2.822933e-04
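
The chunk that produces the Tukey table is not shown in this output. A minimal sketch of how such a table could be obtained is given here, assuming the pairs are kept when the adjusted p-value is below 0.05 (an assumption, although it is consistent with the values in the table). The toy data and the names `toy`, `fit` and `tukey_sig` are hypothetical.

```r
set.seed(1)

# Toy data: groups "a" and "b" share a mean, group "c" is clearly lower
toy <- data.frame(
  method     = factor(rep(c("a", "b", "c"), each = 20)),
  indicator2 = c(rnorm(20, 0.8, 0.1), rnorm(20, 0.8, 0.1), rnorm(20, 0.3, 0.1))
)

fit <- aov(indicator2 ~ method, data = toy)

# TukeyHSD() returns one matrix per factor with columns diff, lwr, upr, p adj
tukey <- as.data.frame(TukeyHSD(fit)$method)

# Keep only the pairs with adjusted p < 0.05 (assumed threshold)
tukey_sig <- tukey[tukey[["p adj"]] < 0.05, ]
tukey_sig
```

Each row name has the form `level2-level1`, and `diff` is the mean of the first level minus the mean of the second.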

One-way ANOVA for the effect of the country on indicator 2, removing outliers >200 pops

# subset data without outliers
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<200)

# summary of n per variable
table(ind2_data_anova$country_assessment)
## 
##     australia       belgium        france         japan        mexico 
##            26            20            27            50            23 
##  south_africa        sweden united_states 
##            90            72           139
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ country_assessment, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = indicator2 ~ country_assessment, data = ind2_data_anova)
## 
## Terms:
##                 country_assessment Residuals
## Sum of Squares            8.116879 19.842171
## Deg. of Freedom                  7       439
## 
## Residual standard error: 0.2125995
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                     Df Sum Sq Mean Sq F value Pr(>F)    
## country_assessment   7  8.117  1.1596   25.66 <2e-16 ***
## Residuals          439 19.842  0.0452                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                  diff         lwr         upr        p adj
## belgium-australia          -0.4704909 -0.66307208 -0.27790972 0.000000e+00
## sweden-australia           -0.2576349 -0.40578324 -0.10948661 5.169073e-06
## france-belgium              0.4095050  0.21848066  0.60052928 5.101817e-09
## japan-belgium               0.5158197  0.34450853  0.68713082 0.000000e+00
## mexico-belgium              0.4969288  0.29896220  0.69489539 0.000000e+00
## south_africa-belgium        0.5230411  0.36297601  0.68310624 0.000000e+00
## sweden-belgium              0.2128560  0.04919344  0.37651850 2.203044e-03
## united_states-belgium       0.3867284  0.23187787  0.54157897 0.000000e+00
## sweden-france              -0.1966490 -0.34276778 -0.05053021 1.273976e-03
## sweden-japan               -0.3029637 -0.42216068 -0.18376672 0.000000e+00
## united_states-japan        -0.1290913 -0.23586761 -0.02231490 6.291789e-03
## sweden-mexico              -0.2840728 -0.43915726 -0.12898838 1.178200e-06
## sweden-south_africa        -0.3101852 -0.41256314 -0.20780716 0.000000e+00
## united_states-south_africa -0.1363127 -0.22391706 -0.04870835 7.872918e-05
## united_states-sweden        0.1738724  0.07985593  0.26788897 8.825632e-07

One-way ANOVA for the effect of the taxonomic group on indicator 2, removing outliers >200 pops and taxonomic groups with too few data

# summary of n per variable
table(ind2_data$taxonomic_group)
## 
##     amphibian    angiosperm          bird     bryophyte          fish 
##            49           222            87             4            57 
##        fungus    gymnosperm  invertebrate        mammal         other 
##             1            17           133           134            18 
## pteridophytes       reptile 
##            12            60
# subset data 
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<200) %>% 
  filter(taxonomic_group %!in% c("fungus", "bryophyte", "other", "pteridophytes"))

# summary of n per variable
table(ind2_data_anova$taxonomic_group)
## 
##    amphibian   angiosperm         bird         fish   gymnosperm invertebrate 
##           36          132           43           40            9           69 
##       mammal      reptile 
##           64           31
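
The `%!in%` operator used in the subset above is one of the notebook's custom functions and its definition is not shown in this section. A common way to write such an operator, given here only as an assumed equivalent, is to negate base R's `%in%`:

```r
# Assumed definition: TRUE for elements of x that are NOT found in table
`%!in%` <- Negate(`%in%`)

groups <- c("mammal", "fungus", "bird", "other")
kept   <- groups[groups %!in% c("fungus", "bryophyte", "other", "pteridophytes")]
kept
```

With this definition, `filter(taxonomic_group %!in% c(...))` keeps every row whose taxonomic group is not in the listed vector.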
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ taxonomic_group, data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = indicator2 ~ taxonomic_group, data = ind2_data_anova)
## 
## Terms:
##                 taxonomic_group Residuals
## Sum of Squares          3.43812  23.70842
## Deg. of Freedom               7       416
## 
## Residual standard error: 0.2387287
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## taxonomic_group   7  3.438  0.4912   8.618 6.96e-10 ***
## Residuals       416 23.708  0.0570                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                               diff         lwr           upr        p adj
## invertebrate-amphibian  -0.2102735 -0.35979862 -0.0607483793 5.984864e-04
## invertebrate-angiosperm -0.1908065 -0.29884585 -0.0827672144 3.441991e-06
## invertebrate-bird       -0.2025492 -0.34385030 -0.0612480996 4.208169e-04
## invertebrate-fish       -0.1449407 -0.28946938 -0.0004120327 4.876394e-02
## invertebrate-gymnosperm -0.3326302 -0.59037907 -0.0748812894 2.490181e-03
## mammal-invertebrate      0.2887679  0.16255422  0.4149816709 3.237879e-10
## reptile-invertebrate     0.2412486  0.08399883  0.3984982978 1.079755e-04

Two-way ANOVA with interaction effect of the method to define populations and the country, removing outliers >200 pops

Question: is the interaction model the correct one, or should it be additive?

## country_assessment  defined_populations_simplified             n
## australia           genetic_clusters                           5
## australia           genetic_clusters geographic_boundaries     2
## australia           geographic_boundaries                     11
## australia           geographic_boundaries management_units     5
## australia           low_freq_combinations                      2
## australia           management_units                           1
## belgium             low_freq_combinations                      2
## belgium             management_units                          18
## france              genetic_clusters                           1
## france              genetic_clusters eco_biogeo_proxies        2
## france              genetic_clusters geographic_boundaries     1
## france              geographic_boundaries                      3
## france              geographic_boundaries eco_biogeo_proxies   3
## france              low_freq_combinations                     15
## france              management_units                           2
## japan               adaptive_traits management_units          19
## japan               geographic_boundaries                      1
## japan               geographic_boundaries adaptive_traits     18
## japan               low_freq_combinations                     12
## mexico              genetic_clusters                           4
## mexico              genetic_clusters geographic_boundaries     3
## mexico              geographic_boundaries                      4
## mexico              geographic_boundaries adaptive_traits      9
## mexico              low_freq_combinations                      2
## mexico              other                                      1
## south_africa        eco_biogeo_proxies                         3
## south_africa        genetic_clusters                          25
## south_africa        genetic_clusters geographic_boundaries    18
## south_africa        geographic_boundaries                     30
## south_africa        geographic_boundaries management_units     2
## south_africa        low_freq_combinations                      5
## south_africa        management_units                           6
## south_africa        other                                      1
## sweden              eco_biogeo_proxies                         2
## sweden              genetic_clusters                           3
## sweden              genetic_clusters geographic_boundaries     8
## sweden              geographic_boundaries                     46
## sweden              geographic_boundaries adaptive_traits      3
## sweden              geographic_boundaries management_units     3
## sweden              low_freq_combinations                      6
## sweden              management_units                           1
## united_states       eco_biogeo_proxies                        20
## united_states       genetic_clusters                           6
## united_states       genetic_clusters eco_biogeo_proxies        8
## united_states       genetic_clusters geographic_boundaries     7
## united_states       geographic_boundaries                     46
## united_states       geographic_boundaries eco_biogeo_proxies  12
## united_states       geographic_boundaries management_units     6
## united_states       low_freq_combinations                     19
## united_states       management_units                          10
## united_states       other                                      5
## Call:
##    aov(formula = indicator2 ~ defined_populations_simplified * country_assessment, 
##     data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified country_assessment
## Sum of Squares                        3.296467           5.315673
## Deg. of Freedom                             11                  7
##                 defined_populations_simplified:country_assessment Residuals
## Sum of Squares                                           2.036565 17.310345
## Deg. of Freedom                                                32       396
## 
## Residual standard error: 0.2090765
## 45 out of 96 effects not estimable
## Estimated effects may be unbalanced
##                                                    Df Sum Sq Mean Sq F value
## defined_populations_simplified                     11  3.296  0.2997   6.856
## country_assessment                                  7  5.316  0.7594  17.372
## defined_populations_simplified:country_assessment  32  2.037  0.0636   1.456
## Residuals                                         396 17.310  0.0437        
##                                                     Pr(>F)    
## defined_populations_simplified                    1.33e-10 ***
## country_assessment                                 < 2e-16 ***
## defined_populations_simplified:country_assessment   0.0554 .  
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                                                   diff         lwr         upr        p adj
## management_units-adaptive_traits management_units           -0.3137825 -0.50692434 -0.12064057 1.007108e-05
## management_units-eco_biogeo_proxies                         -0.1889437 -0.36596116 -0.01192624 2.486190e-02
## geographic_boundaries-genetic_clusters                      -0.1528869 -0.27158880 -0.03418504 1.683367e-03
## management_units-genetic_clusters                           -0.3142083 -0.46643697 -0.16197965 2.759523e-09
## management_units-genetic_clusters eco_biogeo_proxies        -0.2689522 -0.51325948 -0.02464487 1.716874e-02
## management_units-genetic_clusters geographic_boundaries     -0.2125950 -0.36928057 -0.05590951 6.482061e-04
## geographic_boundaries adaptive_traits-geographic_boundaries  0.1762672  0.03805844  0.31447592 1.994144e-03
## management_units-geographic_boundaries                      -0.1613214 -0.28696279 -0.03567998 1.776752e-03
## management_units-geographic_boundaries adaptive_traits      -0.3375886 -0.50547270 -0.16970443 8.116088e-09
## management_units-geographic_boundaries management_units     -0.2492383 -0.45409623 -0.04438042 4.284597e-03
## management_units-low_freq_combinations                      -0.2270969 -0.36828761 -0.08590611 1.320299e-05
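
On the question raised at the start of this section, one possible way to compare the additive and the interaction models is an F test between the two nested fits. The sketch below uses simulated toy data, not the notebook's data, and the names `toy`, `fit_add`, `fit_int` and `cmp` are hypothetical. The models are fitted with `lm()`, which `aov()` calls internally, so that `anova()` can compare them directly.

```r
set.seed(42)

# Toy balanced design: 3 methods x 2 countries x 10 replicates
toy <- expand.grid(
  method  = c("geographic_boundaries", "management_units", "genetic_clusters"),
  country = c("country_a", "country_b"),
  rep     = 1:10
)
toy$indicator2 <- runif(nrow(toy))   # proportion-like response, no real effect

fit_add <- lm(indicator2 ~ method + country, data = toy)   # additive model
fit_int <- lm(indicator2 ~ method * country, data = toy)   # with interaction

# F test of the interaction term: does it reduce the residual sum of squares
# by more than expected by chance?
cmp <- anova(fit_add, fit_int)
cmp
```

A large p-value in the second row would suggest that the additive model is sufficient. Two caveats apply to the real data: `aov()` reports sequential sums of squares, so with an unbalanced design the order of the terms in the formula changes the table, and many country and method combinations are empty, which is why the output above reports effects that are not estimable.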

Two-way ANOVA with interaction effect of the method to define populations and the country, but keeping only groups with enough n

# variables with enough n
enough_n<-ind2_data %>%
  group_by(country_assessment, defined_populations_simplified) %>% 
  summarise(n=n()) %>% 
  filter(n>=15)
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# subset data without outliers and with enough n
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<200) %>% 
  # this gives the country
  filter(country_assessment==unique(enough_n$country_assessment)[1] & 
           #this gives the methods for that country (the last [[1]] is to get the results out of a list)
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[1], 2][[1]]  | 
           
           # the same for rest of countries. Notice the use of & for methods within country and | to change to other country
           country_assessment==unique(enough_n$country_assessment)[2] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[2], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[3] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[3], 2][[1]] |
           
           country_assessment==unique(enough_n$country_assessment)[4] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[4], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[5] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[5], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[6] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[6], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[7] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[7], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[8] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[8], 2][[1]])
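
The chained conditions above keep the rows whose country and method combination appears in `enough_n`. A more compact way to express the same idea is a filtering join. The sketch below is an assumed alternative shown on toy data (the names `toy`, `enough_n_toy` and `toy_subset` are hypothetical, and the threshold is lowered to 3 so that the toy example stays small); it should select the same rows as long as the key columns contain no missing values.

```r
library(dplyr)

# Toy data with the same key columns as ind2_data
toy <- data.frame(
  country_assessment = c(rep("country_a", 5), rep("country_b", 4)),
  defined_populations_simplified = c(rep("geographic_boundaries", 3), rep("other", 2),
                                     rep("management_units", 3), "other"),
  indicator2 = seq(0.1, 0.9, by = 0.1)
)

# Country x method combinations with enough records (threshold 3 only for the toy data)
enough_n_toy <- toy %>%
  count(country_assessment, defined_populations_simplified) %>%
  filter(n >= 3)

# semi_join() keeps the rows of toy that have a match in enough_n_toy
toy_subset <- toy %>%
  semi_join(enough_n_toy, by = c("country_assessment", "defined_populations_simplified"))

toy_subset
```

Unlike the indexed version, this does not depend on the number of countries present in `enough_n`.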

# summary of n per variable
ind2_data_anova %>%
  group_by(country_assessment, defined_populations_simplified) %>% summarise(n=n())
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# Two-way ANOVA with interaction effect of the method to define populations and the country
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified * country_assessment , data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = indicator2 ~ defined_populations_simplified * country_assessment, 
##     data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified country_assessment
## Sum of Squares                        4.245618           2.775843
## Deg. of Freedom                              8                  5
##                 defined_populations_simplified:country_assessment Residuals
## Sum of Squares                                           0.489659 12.931021
## Deg. of Freedom                                                 3       300
## 
## Residual standard error: 0.2076136
## 55 out of 72 effects not estimable
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                                    Df Sum Sq Mean Sq F value
## defined_populations_simplified                      8  4.246  0.5307  12.312
## country_assessment                                  5  2.776  0.5552  12.880
## defined_populations_simplified:country_assessment   3  0.490  0.1632   3.787
## Residuals                                         300 12.931  0.0431        
##                                                     Pr(>F)    
## defined_populations_simplified                    3.00e-15 ***
## country_assessment                                2.34e-11 ***
## defined_populations_simplified:country_assessment   0.0108 *  
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):
##                                                                   diff         lwr          upr        p adj
## geographic_boundaries-adaptive_traits management_units      -0.1596277 -0.31850793 -0.000747464 4.790991e-02
## management_units-adaptive_traits management_units           -0.5129239 -0.72629623 -0.299551590 2.533862e-11
## management_units-eco_biogeo_proxies                         -0.4225076 -0.63326893 -0.211746370 4.663398e-08
## geographic_boundaries-genetic_clusters                      -0.1589618 -0.28334613 -0.034577379 2.626670e-03
## management_units-genetic_clusters                           -0.5122580 -0.70135131 -0.323164626 1.064038e-12
## management_units-genetic_clusters geographic_boundaries     -0.4584124 -0.65732106 -0.259503748 1.749125e-10
## geographic_boundaries adaptive_traits-geographic_boundaries  0.1798768  0.01717028  0.342583295 1.802999e-02
## management_units-geographic_boundaries                      -0.3532962 -0.51600272 -0.190589705 2.232140e-09
## management_units-geographic_boundaries adaptive_traits      -0.5331730 -0.74940951 -0.316936491 7.973733e-12
## management_units-geographic_boundaries eco_biogeo_proxies   -0.3618975 -0.60365722 -0.120137689 1.513420e-04
## management_units-low_freq_combinations                      -0.4056763 -0.59476968 -0.216582993 3.626228e-09

ANOVAs on indicator 2

One-way ANOVA for the effect of the method to define populations on indicator 2, removing the extreme outlier (>1,000 pops)

# subset data without massive outlier
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<1000)

# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       25 
##                         genetic_clusters 
##                                       44 
##      genetic_clusters eco_biogeo_proxies 
##                                       10 
##   genetic_clusters geographic_boundaries 
##                                       39 
##                    geographic_boundaries 
##                                      141 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       15 
##   geographic_boundaries management_units 
##                                       16 
##                    low_freq_combinations 
##                                       63 
##                         management_units 
##                                       38 
##                                    other 
##                                        7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
summary(res.anova.extant)
##                                 Df Sum Sq Mean Sq F value   Pr(>F)    
## defined_populations_simplified  11  3.296  0.2997   5.286 7.34e-08 ***
## Residuals                      435 24.663  0.0567                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Same One-way ANOVA for the effect of the method to define populations on indicator 2, but removing outliers >1000 pops

# subset data without outliers
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<1000)

# summary of n per variable
table(ind2_data_anova$defined_populations_simplified)
## 
##         adaptive_traits management_units 
##                                       19 
##                       eco_biogeo_proxies 
##                                       25 
##                         genetic_clusters 
##                                       44 
##      genetic_clusters eco_biogeo_proxies 
##                                       10 
##   genetic_clusters geographic_boundaries 
##                                       39 
##                    geographic_boundaries 
##                                      141 
##    geographic_boundaries adaptive_traits 
##                                       30 
## geographic_boundaries eco_biogeo_proxies 
##                                       15 
##   geographic_boundaries management_units 
##                                       16 
##                    low_freq_combinations 
##                                       63 
##                         management_units 
##                                       38 
##                                    other 
##                                        7
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified, data=ind2_data_anova)
summary(res.anova.extant)
##                                 Df Sum Sq Mean Sq F value   Pr(>F)    
## defined_populations_simplified  11  3.296  0.2997   5.286 7.34e-08 ***
## Residuals                      435 24.663  0.0567                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):

One-way ANOVA for the effect of the country on indicator 2, removing outliers >1000 pops

# subset data without outliers
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<1000)

# summary of n per variable
table(ind2_data_anova$country_assessment)
## 
##     australia       belgium        france         japan        mexico 
##            26            20            27            50            23 
##  south_africa        sweden united_states 
##            90            72           139
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ country_assessment, data=ind2_data_anova)
summary(res.anova.extant)
##                     Df Sum Sq Mean Sq F value Pr(>F)    
## country_assessment   7  8.117  1.1596   25.66 <2e-16 ***
## Residuals          439 19.842  0.0452                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):

One-way ANOVA for the effect of the taxonomic group on indicator 2, removing outliers >1000 pops and taxonomic groups with too few data

# summary of n per variable
table(ind2_data$taxonomic_group)
## 
##     amphibian    angiosperm          bird     bryophyte          fish 
##            49           222            87             4            57 
##        fungus    gymnosperm  invertebrate        mammal         other 
##             1            17           133           134            18 
## pteridophytes       reptile 
##            12            60
# subset data 
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<1000) %>% 
  filter(taxonomic_group %!in% c("fungus", "bryophyte", "other"))

# summary of n per variable
table(ind2_data_anova$taxonomic_group)
## 
##     amphibian    angiosperm          bird          fish    gymnosperm 
##            36           132            43            40             9 
##  invertebrate        mammal pteridophytes       reptile 
##            69            64             8            31
# One way ANOVA
res.anova.extant<-aov(indicator2 ~ taxonomic_group, data=ind2_data_anova)
summary(res.anova.extant)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## taxonomic_group   8  3.438  0.4298   7.528 2.06e-09 ***
## Residuals       423 24.149  0.0571                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there are significant differences, check which pairs actually differ (Tukey Honest Significant Differences):

One-way ANOVA for the effect of the global IUCN category on indicator 2, removing outliers >1000 pops and IUCN categories with too few data

# summary of n per variable
table(ind2_data$global_IUCN)
## 
##           cr           dd           en           lc not_assessed           nt 
##           54           13           83          224          258           70 
##      unknown           vu 
##            5           87
# subset data 
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<1000) %>% 
  filter(global_IUCN %!in% c("dd", "unknown"))

# summary of n per variable
table(ind2_data_anova$global_IUCN)
## 
##           cr           en           lc not_assessed           nt           vu 
##           32           51          103          154           40           55
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ global_IUCN, data=ind2_data_anova)
summary(res.anova.extant)
##              Df Sum Sq Mean Sq F value Pr(>F)
## global_IUCN   5  0.092 0.01834    0.29  0.918
## Residuals   429 27.099 0.06317
In this case the ANOVA shows no significant differences (p = 0.918), so the pairwise check (Tukey Honest Significant Differences) is not expected to flag any pairs:

One-way ANOVA for the effect of the species range on indicator 2

# summary of n per variable
table(ind2_data$species_range)
## 
##   restricted      unknown wide_ranging 
##          439           16          339
# subset data 
ind2_data_anova<- ind2_data %>% 
  filter(species_range != "unknown")

# summary of n per variable
table(ind2_data_anova$species_range)
## 
##   restricted wide_ranging 
##          439          339
# One way ANOVA method
res.anova.extant<-aov(indicator2 ~ species_range, data=ind2_data_anova)
summary(res.anova.extant)
##                Df Sum Sq Mean Sq F value Pr(>F)  
## species_range   1  0.205 0.20508   3.267 0.0714 .
## Residuals     439 27.558 0.06278                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 337 observations deleted due to missingness
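
The last line of the output shows that `aov` silently drops rows with missing values: of the 778 rows kept after removing "unknown" (439 + 339), only 441 enter the model (439 residual degrees of freedom, plus 1 for the factor and 1 for the intercept), and 337 are deleted. A small sketch on toy data, with hypothetical names, of how the number of usable rows can be checked before fitting:

```r
# Toy data with two missing indicator values
toy <- data.frame(
  species_range = factor(c("restricted", "restricted", "restricted",
                           "wide_ranging", "wide_ranging", "wide_ranging")),
  indicator2    = c(0.5, NA, 0.7, 0.9, NA, 1.0)
)

n_total <- nrow(toy)
# Rows with both the response and the factor available
n_used  <- sum(complete.cases(toy[, c("indicator2", "species_range")]))

fit     <- aov(indicator2 ~ species_range, data = toy)
n_model <- length(residuals(fit))   # rows actually used by the model

c(total = n_total, complete = n_used, in_model = n_model)
```

Because of this, the group sizes printed by `table()` before the fit count all rows, including those without an indicator value, and can be larger than the sample sizes actually used in the ANOVA.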

One-way ANOVA for the effect of the rarity on indicator 2

# summary of n per variable

table(ind2_data_firstmulti$rarity)
## 
##     not_rare rare_natural  rare_recent 
##          305          308          143
# subset data 
ind2_data_anova<- ind2_data_firstmulti


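# One way ANOVA method
# NOTE: the formula below uses species_range, not rarity, so the output that follows
# tests species range (Df = 2 is consistent with the "unknown" level being kept here).
# To test rarity, the formula would need to be indicator2 ~ rarity.
res.anova.extant<-aov(indicator2 ~ species_range, data=ind2_data_anova)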
summary(res.anova.extant)
##                Df Sum Sq Mean Sq F value Pr(>F)  
## species_range   2  0.493 0.24664   3.918 0.0206 *
## Residuals     417 26.251 0.06295                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 336 observations deleted due to missingness

Two-way ANOVA with interaction effect of the method to define populations and the country, removing outliers >200 pops

Question: is interaction the correct model, or should it be additive?
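One possible way to approach the interaction versus additive question is to fit both models and compare them with an F test for nested models. The following is a minimal sketch on simulated data, not the notebook's analysis: the data frame `toy` and its columns `method`, `country` and `y` are hypothetical stand-ins for `defined_populations_simplified`, `country_assessment` and `indicator2`.

```r
# Simulated stand-in data: two factors and a numeric response (all names hypothetical)
set.seed(1)
toy <- expand.grid(method = c("a", "b", "c"),
                   country = c("x", "y"),
                   rep = 1:10)
toy$y <- rnorm(nrow(toy), mean = 0.5, sd = 0.2)

# Additive model: main effects only
fit_add <- lm(y ~ method + country, data = toy)
# Interaction model: main effects plus the method:country term
fit_int <- lm(y ~ method * country, data = toy)

# F test of the nested models; a small p-value would favour keeping the interaction
model_comparison <- anova(fit_add, fit_int)
model_comparison
```

With real data that have empty method by country cells, as the "effects not estimable" message in the output suggests, many interaction terms cannot be estimated, so such a comparison should be read with caution.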

## Call:
##    aov(formula = indicator2 ~ defined_populations_simplified * country_assessment, 
##     data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified country_assessment
## Sum of Squares                        3.296467           5.315673
## Deg. of Freedom                             11                  7
##                 defined_populations_simplified:country_assessment Residuals
## Sum of Squares                                           2.036565 17.310345
## Deg. of Freedom                                                32       396
## 
## Residual standard error: 0.2090765
## 45 out of 96 effects not estimable
## Estimated effects may be unbalanced
##                                                    Df Sum Sq Mean Sq F value
## defined_populations_simplified                     11  3.296  0.2997   6.856
## country_assessment                                  7  5.316  0.7594  17.372
## defined_populations_simplified:country_assessment  32  2.037  0.0636   1.456
## Residuals                                         396 17.310  0.0437        
##                                                     Pr(>F)    
## defined_populations_simplified                    1.33e-10 ***
## country_assessment                                 < 2e-16 ***
## defined_populations_simplified:country_assessment   0.0554 .  
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):
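As a minimal sketch of how such a post hoc comparison can be run, the example below applies `TukeyHSD()` from base R's stats package to an `aov` fit. The data are simulated and the names `toy`, `group` and `y` are hypothetical; the notebook's own chunk may differ.

```r
# Simulated stand-in data (names hypothetical): one factor with three levels
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("g1", "g2", "g3"), each = 20)),
  y = c(rnorm(20, 0.4, 0.1), rnorm(20, 0.5, 0.1), rnorm(20, 0.8, 0.1))
)

fit <- aov(y ~ group, data = toy)

# Pairwise differences between group means, with confidence intervals and
# p-values adjusted for the family of comparisons
tukey <- TukeyHSD(fit, which = "group", conf.level = 0.95)
tukey$group
```

For a two-way model, the `which` argument selects the term whose levels are compared, for example one of the two main effects.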

Two-way ANOVA with interaction effect of the method to define populations and the country, but keeping only groups with enough n

# variables with enough n
enough_n<-ind2_data %>%
  group_by(country_assessment, defined_populations_simplified) %>% 
  summarise(n=n()) %>% 
  filter(n>=15)
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
# subset data without outliers and with enough n
ind2_data_anova<- ind2_data %>% 
  filter(indicator2<200) %>% 
  # this gives the country
  filter(country_assessment==unique(enough_n$country_assessment)[1] & 
           #this gives the methods for that country (the last [[1]] is to get the results out of a list)
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[1], 2][[1]]  | 
           
           # the same for rest of countries. Notice the use of & for methods within country and | to change to other country
           country_assessment==unique(enough_n$country_assessment)[2] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[2], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[3] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[3], 2][[1]] |
           
           country_assessment==unique(enough_n$country_assessment)[4] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[4], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[5] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[5], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[6] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[6], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[7] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[7], 2][[1]] | 
           
           country_assessment==unique(enough_n$country_assessment)[8] & 
           defined_populations_simplified %in% enough_n[enough_n$country_assessment==unique(enough_n$country_assessment)[8], 2][[1]])

# summary of n per variable
ind2_data_anova %>%
  group_by(country_assessment, defined_populations_simplified) %>% summarise(n=n())
## `summarise()` has grouped output by 'country_assessment'. You can override
## using the `.groups` argument.
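The long filter used above to build `ind2_data_anova` keeps the rows whose country and method combination is listed in `enough_n`. Assuming that is the intent, the same kind of subset can be sketched more compactly with `dplyr::semi_join()`. The data frames `toy` and `enough_n_toy` below are hypothetical stand-ins, and the threshold of 2 is chosen only so the toy example has something to drop.

```r
library(dplyr)

# Hypothetical stand-in for ind2_data
toy <- data.frame(
  country_assessment = c("A", "A", "A", "B", "B", "C"),
  defined_populations_simplified = c("m1", "m1", "m2", "m1", "m1", "m2"),
  indicator2 = c(0.2, 0.4, 0.9, 0.5, 0.7, 0.1)
)

# Country/method combinations with enough records (threshold of 2 for this toy example)
enough_n_toy <- toy %>%
  count(country_assessment, defined_populations_simplified) %>%
  filter(n >= 2)

# Keep only the rows of toy whose combination is listed in enough_n_toy
toy_subset <- toy %>%
  semi_join(enough_n_toy,
            by = c("country_assessment", "defined_populations_simplified"))
toy_subset
```

Unlike the hard-coded version, this form does not stop at the first eight countries, so the two would differ if more than eight countries passed the threshold.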
# Two-way ANOVA with interaction effect of the method to define populations and the country
res.anova.extant<-aov(indicator2 ~ defined_populations_simplified * country_assessment , data=ind2_data_anova)
res.anova.extant
## Call:
##    aov(formula = indicator2 ~ defined_populations_simplified * country_assessment, 
##     data = ind2_data_anova)
## 
## Terms:
##                 defined_populations_simplified country_assessment
## Sum of Squares                        4.245618           2.775843
## Deg. of Freedom                              8                  5
##                 defined_populations_simplified:country_assessment Residuals
## Sum of Squares                                           0.489659 12.931021
## Deg. of Freedom                                                 3       300
## 
## Residual standard error: 0.2076136
## 55 out of 72 effects not estimable
## Estimated effects may be unbalanced
summary(res.anova.extant)
##                                                    Df Sum Sq Mean Sq F value
## defined_populations_simplified                      8  4.246  0.5307  12.312
## country_assessment                                  5  2.776  0.5552  12.880
## defined_populations_simplified:country_assessment   3  0.490  0.1632   3.787
## Residuals                                         300 12.931  0.0431        
##                                                     Pr(>F)    
## defined_populations_simplified                    3.00e-15 ***
## country_assessment                                2.34e-11 ***
## defined_populations_simplified:country_assessment   0.0108 *  
## Residuals                                                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since there are significant differences, check which pairs are actually different (Tukey Honest Significant Differences):

Estimate indicator 3 (number of taxa with genetic monitoring schemes)

Indicator 3 refers to the number (count) of taxa by country in which genetic monitoring is occurring. This is stored in the variable temp_gen_monitoring as a “yes/no” answer for each taxon, so to estimate the indicator, we only need to count how many said “yes”, keeping only one of the records when the taxon was multiassessed:
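The counting described here can be sketched as follows. This is an illustration under assumptions rather than the notebook's own chunk: `toy_ind3` is a hypothetical stand-in for `ind3_data`, and the column names `country_assessment` and `taxon` are assumed to identify the country and the taxon.

```r
library(dplyr)

# Hypothetical stand-in for ind3_data; sp1 was assessed twice in country A
toy_ind3 <- data.frame(
  country_assessment = c("A", "A", "A", "B", "B"),
  taxon = c("sp1", "sp1", "sp2", "sp3", "sp4"),
  temp_gen_monitoring = c("yes", "yes", "no", "yes", "no")
)

indicator3 <- toy_ind3 %>%
  # keep the first record of each taxon within a country (drops repeated assessments)
  distinct(country_assessment, taxon, .keep_all = TRUE) %>%
  group_by(country_assessment) %>%
  # count assessed taxa and how many of them report genetic monitoring
  summarise(n_taxa = n(),
            indicator3 = sum(temp_gen_monitoring == "yes"))
indicator3
```

Keeping `n_taxa` next to the count makes it easy to see how many taxa were assessed in each country when reading the indicator.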

Plot indicator 3 by country:

Relatively few taxa have genetic monitoring, but many have some sort of genetic study. Let’s check that, but first subset the ind3_data, keeping only taxa assessed a single time plus the first record of those assessed multiple times.

Sankey plot of genetic studies

Similar, but as an alluvial plot, coloring the flow by country

Highlights

Taxa and records by country

## [1] "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D" "#F8766D"

Indicator 3

Indicator 2

Plots by country, see the for loop above.

Session Info for reproducibility purposes:

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] cowplot_1.1.1      viridis_0.6.3      viridisLite_0.4.0  alluvial_0.1-2    
##  [5] ggsankey_0.0.99999 ggplot2_3.4.1      stringr_1.4.0      utile.tools_0.2.7 
##  [9] dplyr_1.0.9        tidyr_1.2.0       
## 
## loaded via a namespace (and not attached):
##  [1] highr_0.9        pillar_1.7.0     bslib_0.3.1      compiler_4.2.1  
##  [5] jquerylib_0.1.4  tools_4.2.1      digest_0.6.29    gtable_0.3.0    
##  [9] jsonlite_1.8.0   evaluate_0.15    lifecycle_1.0.3  tibble_3.1.7    
## [13] pkgconfig_2.0.3  rlang_1.0.6      cli_3.6.0        DBI_1.1.3       
## [17] rstudioapi_0.13  yaml_2.3.5       xfun_0.31        fastmap_1.1.0   
## [21] gridExtra_2.3    withr_2.5.0      knitr_1.39       generics_0.1.3  
## [25] vctrs_0.5.2      sass_0.4.1       grid_4.2.1       tidyselect_1.1.2
## [29] glue_1.6.2       R6_2.5.1         fansi_1.0.3      rmarkdown_2.14  
## [33] farver_2.1.1     purrr_0.3.4      magrittr_2.0.3   scales_1.2.0    
## [37] ellipsis_0.3.2   htmltools_0.5.5  assertthat_0.2.1 colorspace_2.0-3
## [41] labeling_0.4.2   utf8_1.2.2       stringi_1.7.6    munsell_0.5.0   
## [45] crayon_1.5.1